The Level of Residual Dispersion Variation and the Power of Differential Expression Tests for RNA-Seq Data
نویسندگان
چکیده
RNA-Sequencing (RNA-Seq) has been widely adopted for quantifying gene expression changes in comparative transcriptome analysis. For detecting differentially expressed genes, a variety of statistical methods based on the negative binomial (NB) distribution have been proposed. These methods differ in the ways they handle the NB nuisance parameters (i.e., the dispersion parameters associated with each gene) to save power, such as by using a dispersion model to exploit an apparent relationship between the dispersion parameter and the NB mean. Presumably, dispersion models with fewer parameters will result in greater power if the models are correct, but will produce misleading conclusions if not. This paper investigates this power and robustness trade-off by assessing rates of identifying true differential expression using the various methods under realistic assumptions about NB dispersion parameters. Our results indicate that the relative performances of the different methods are closely related to the level of dispersion variation unexplained by the dispersion model. We propose a simple statistic to quantify the level of residual dispersion variation from a fitted dispersion model and show that the magnitude of this statistic gives hints about whether and how much we can gain statistical power by a dispersion-modeling approach.
منابع مشابه
Investigating the Function of Predicted Proteins from RNA-Seq Data in Holstein and Cholistani Cattle Breeds
This study was performed to determine the digital expression profile of different genes expressed in Holstein and Cholistani breeds as well as to evaluate the performance of predicted proteins derived from differentially expressed genes between these two breeds using RNA-Seq data. For this purpose, the whole mRNA sequence for a blood sample of American Holstein and Pakistani Cholistani cattle p...
متن کاملRegulatory effects of cis- and trans-LncRNAs on differential expression of genes following infection with viral hemorrhagic septicemia virus in rainbow trout (Oncorhynchus mykiss)
In this study the cis and trans regulatory effect of long non-coding genes (lncRNA) on the expression of genes in fish infected by Viral hemorrhagic septicemia virus (VHS) was investigated using RNA-seq technology. At the end of experimental period (the thirty fifth day), total RNA was extracted from spleen tissue (group treated with virus) and physiological serum (control group) was used to pr...
متن کاملPower analysis and sample size estimation for RNA-Seq differential expression.
It is crucial for researchers to optimize RNA-seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs, under the assumption of the negative binomial distribution. We simulate RNA-Seq count data based on parameters estimated from six widely different public data sets (...
متن کاملIdentification of soybean circular RNAs in response to low nitrogen and phosphorus stress
Soybean, one of the most important sources of edible oil and protein in the world, is exposed to various environmental biotic and abiotic stresses. These stresses can negatively impact the quality and quantity of soybean production. This study aimed to identify genes that express circular RNAs in response to low phosphorus and nitrogen stresses in soybean roots. Soybean seeds were grown under d...
متن کاملDifferential analyses with DSS
This vignette introduces the use of the Bioconductor package DSS (Dispersion Shrinkage for Sequencing data), which is designed for differential analysis based on high-throughput sequencing data. It performs differential expression analyses for RNA-seq, and differential methylation analyses for bisulfite sequencing (BS-seq) data. The core of DSS is a procedure based on Bayesian hierarchical mode...
متن کامل